[fix][broker] fix delayedMessagesCount error in InMemoryDelayedDeliveryTracker #25076
TakaHiR07 wants to merge 3 commits into apache:master
Conversation
```java
        .computeIfAbsent(timestamp, k -> new Long2ObjectRBTreeMap<>())
        .computeIfAbsent(ledgerId, k -> new Roaring64Bitmap());
if (!roaring64Bitmap.contains(entryId)) {
    roaring64Bitmap.add(entryId);
```
It looks like .addLong should be used in the Roaring64Bitmap API.
```diff
- roaring64Bitmap.add(entryId);
+ roaring64Bitmap.addLong(entryId);
```
The .add method works too, but the method signature takes a long array (long...). Perhaps the compiler is able to optimize that, so it might not make a difference.
It's unfortunate that Roaring64Bitmap doesn't have the checkedAdd method as there is in RoaringBitmap. That would eliminate the need for the .contains check.
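The contains-then-add pattern being discussed can be sketched like this. Note this is an illustrative stand-in, not the broker code: it uses a `HashSet<Long>` in place of `Roaring64Bitmap` (which has no `checkedAdd`), and the class and field names are hypothetical:

```java
import java.util.HashSet;
import java.util.Set;

public class DuplicateCheck {
    private final Set<Long> entryIds = new HashSet<>();
    private long delayedMessagesCount = 0;

    // Only bump the counter when the entryId was actually inserted,
    // mirroring the contains()-before-add() fix in this PR.
    public boolean addMessage(long entryId) {
        if (entryIds.contains(entryId)) {
            return false; // duplicate: the count must not change
        }
        entryIds.add(entryId);
        delayedMessagesCount++;
        return true;
    }

    public long count() {
        return delayedMessagesCount;
    }

    public static void main(String[] args) {
        DuplicateCheck t = new DuplicateCheck();
        t.addMessage(1L);
        t.addMessage(1L); // duplicate, ignored
        t.addMessage(2L);
        System.out.println(t.count()); // prints 2
    }
}
```

With `RoaringBitmap.checkedAdd` the `contains` probe and the `add` would collapse into a single call returning whether the value was new; since `Roaring64Bitmap` lacks it, the two-step check is the workaround.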
Yes, unfortunately.
"add" or "addLong" I guess is the same after compile, sure it may be better to use "addLong" directly.
Besides, I think there is no need to consider concurrent situation in InMemoryDelayedDeliveryTracker. I don't see any code point out that concurrent situation would occur.
Threading question: addMessage() and getScheduledMessages() are invoked under synchronized (this) in the dispatcher (e.g. PersistentDispatcherMultipleConsumers#trackDelayedDelivery), but clearDelayedMessages() doesn’t seem synchronized and InMemoryDelayedDeliveryTracker#clear() isn’t synchronized either.
Is clear() guaranteed to be called under the same lock, or should we align with BucketDelayedDeliveryTracker#clear() (synchronized) to avoid concurrent access to delayedMessageMap/bitmaps?
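The alignment being suggested can be sketched as follows. This is a hypothetical minimal class, not the actual tracker: it only shows the locking shape, with a `TreeMap` standing in for the real `delayedMessageMap` of bitmaps:

```java
import java.util.TreeMap;

public class TrackerSync {
    // timestamp -> entryId (stand-in for the real nested bitmap structure)
    private final TreeMap<Long, Long> delayedMessageMap = new TreeMap<>();

    // addMessage()/getScheduledMessages() run under the dispatcher's monitor;
    // modeled here as synchronized methods on the tracker itself.
    public synchronized void addMessage(long deliverAt, long entryId) {
        delayedMessageMap.put(deliverAt, entryId);
    }

    // Aligning with BucketDelayedDeliveryTracker#clear(): making clear()
    // synchronized ensures it cannot race with the other mutators.
    public synchronized void clear() {
        delayedMessageMap.clear();
    }

    public synchronized int size() {
        return delayedMessageMap.size();
    }

    public static void main(String[] args) {
        TrackerSync t = new TrackerSync();
        t.addMessage(100L, 1L);
        t.clear();
        System.out.println(t.size()); // prints 0
    }
}
```

Note this only helps if every caller goes through the same monitor; if the dispatcher locks on its own `this`, the tracker's `synchronized` would be a second, independent lock.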
@TakaHiR07 Please check the previous comment by @Denovo1998.
btw. this code location was discussed in the review: https://github.com/apache/pulsar/pull/24430/changes#r2156278377
```java
updateTimer();
checkAndUpdateHighest(deliverAt);
```
One thought: should updateTimer() and checkAndUpdateHighest(deliverAt) run only when we actually insert a new entryId?
With the current structure, duplicate addMessage() calls still update highestDeliveryTimeTracked / messagesHaveFixedDelay, which could disable the fixed-delay optimization even though the tracker state didn't change.
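The suggested guard could look roughly like this. It is a sketch, not the broker code: a `HashSet` stands in for the bitmap, and `checkAndUpdateHighest` here is a simplified model of the real method:

```java
import java.util.HashSet;
import java.util.Set;

public class FixedDelayGuard {
    private final Set<Long> entryIds = new HashSet<>();
    private long highestDeliveryTimeTracked = 0;
    private boolean messagesHaveFixedDelay = true;

    public boolean addMessage(long entryId, long deliverAt) {
        if (entryIds.contains(entryId)) {
            // Duplicate: skip the bookkeeping so messagesHaveFixedDelay
            // is not flipped by an insert that changed nothing.
            return false;
        }
        entryIds.add(entryId);
        checkAndUpdateHighest(deliverAt);
        return true;
    }

    // Simplified model: an out-of-order deliverAt disables fixed delay.
    private void checkAndUpdateHighest(long deliverAt) {
        if (deliverAt < highestDeliveryTimeTracked) {
            messagesHaveFixedDelay = false;
        }
        highestDeliveryTimeTracked = Math.max(highestDeliveryTimeTracked, deliverAt);
    }

    public boolean hasFixedDelay() {
        return messagesHaveFixedDelay;
    }
}
```

With the guard, replaying an old (entryId, deliverAt) pair is a no-op; without it, the stale deliverAt would compare below the tracked highest and disable the optimization.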
Great catch. I have checked the earliest fixed-delay implementation in #16609. The issue already existed there: on a duplicate addMessage(), highestDeliveryTimeTracked stays correct, but messagesHaveFixedDelay would be incorrectly set to false.
I will look into why duplicate addMessage() calls happen later, and I would prefer to open another PR to fix that additional issue.
```java
log.error("[{}] Delayed message tracker getScheduledMessages should not < 0, number is: {}",
        dispatcher.getName(), n);
```
About the new n < 0 branch: this should be unreachable in normal flow. One potential way to hit it is int overflow from int cardinality = (int) entryIds.getLongCardinality().
Would it be better to keep cardinality as long (and compare cardinality <= (long) n) to eliminate overflow, instead of only logging when n < 0?
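The overflow concern can be shown with a tiny self-contained example (plain arithmetic, no Pulsar code; method names are illustrative): narrowing a cardinality above Integer.MAX_VALUE to int yields a negative value, which is exactly how the n < 0 branch could be reached.

```java
public class CardinalityOverflow {
    // Mimics the risky pattern: int cardinality = (int) entryIds.getLongCardinality()
    static int narrowed(long cardinality) {
        return (int) cardinality;
    }

    // Safer comparison: keep the cardinality as a long and widen n instead.
    static boolean fitsWithinLimit(long cardinality, int n) {
        return cardinality <= (long) n;
    }

    public static void main(String[] args) {
        long huge = 3_000_000_000L;                    // > Integer.MAX_VALUE
        System.out.println(narrowed(huge));            // negative after the int cast
        System.out.println(fitsWithinLimit(huge, 10)); // false, no overflow
    }
}
```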
You are right. I think it is another issue, and using a long value in both places is better. Do you think we should fix it in this PR, or will you push another PR to fix it?
```java
// case2: addMessage() with duplicate entryId,
```
In case2 the comment says it enters cardinality > n, but with getScheduledMessages(10) and 4 unique entryIds it should hit the cardinality <= n branch. Could we adjust the comment to match the scenario (case3 seems to be the one exercising cardinality > n)?
Yes, I have changed the comment.
@thetumbled Do you have a chance to review the current PR?
Maybe we should figure out why duplicate entry IDs are added multiple times, if this class does not intentionally allow that behavior.
Pull request overview
This pull request fixes a bug in the InMemoryDelayedDeliveryTracker where duplicate message entries could cause incorrect delayed message counts, potentially leading to NPE issues. The fix adds a duplicate check before incrementing the counter and improves error handling for edge cases.
Key changes:
- Add a duplicate entry check in `addMessage()` using `Roaring64Bitmap.contains()` before adding entries
- Improve error handling in `getScheduledMessages()` to explicitly handle and log the `n < 0` case
- Add comprehensive test coverage for duplicate entry scenarios across multiple test cases
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| pulsar-broker/src/main/java/org/apache/pulsar/broker/delayed/InMemoryDelayedDeliveryTracker.java | Implements duplicate entry check in addMessage() and enhances error handling in getScheduledMessages() |
| pulsar-broker/src/test/java/org/apache/pulsar/broker/delayed/InMemoryDeliveryTrackerTest.java | Adds comprehensive test method testDelayedMessagesCountWithDuplicateEntryId() covering three scenarios: multiple timestamps with duplicates, single timestamp with duplicates, and partial retrieval with duplicates |
Force-pushed d12846f to 9b562dc
Motivation
An NPE issue occurred with delayed messages. The root cause is in delayedMessagesCount: InMemoryDelayedDeliveryTracker#addMessage() does not check whether the entryId already exists in the Roaring64Bitmap, so delayedMessagesCount no longer matches the number of entries actually tracked in the map.
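The counting bug can be reproduced in miniature like this. JDK maps and sets stand in for `Long2ObjectRBTreeMap` and `Roaring64Bitmap`; the field names mirror the tracker, but this is not the actual broker code:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class CountBugDemo {
    // timestamp -> ledgerId -> entryIds
    private final Map<Long, Map<Long, Set<Long>>> delayedMessageMap = new HashMap<>();
    private long delayedMessagesCount = 0;

    public void addMessage(long timestamp, long ledgerId, long entryId) {
        Set<Long> entryIds = delayedMessageMap
                .computeIfAbsent(timestamp, k -> new HashMap<>())
                .computeIfAbsent(ledgerId, k -> new HashSet<>());
        // The fix: only count entries that were not already tracked.
        // Without this check, a duplicate addMessage() would increment the
        // counter while the set size stays the same, and the two drift apart.
        if (!entryIds.contains(entryId)) {
            entryIds.add(entryId);
            delayedMessagesCount++;
        }
    }

    public long getNumberOfDelayedMessages() {
        return delayedMessagesCount;
    }

    public static void main(String[] args) {
        CountBugDemo t = new CountBugDemo();
        t.addMessage(100L, 1L, 1L);
        t.addMessage(100L, 1L, 1L); // duplicate entryId, must not be counted twice
        t.addMessage(100L, 1L, 2L);
        System.out.println(t.getNumberOfDelayedMessages()); // prints 2
    }
}
```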
Modifications
Verifying this change
Documentation
- doc
- doc-required
- doc-not-needed
- doc-complete